ABSTRACT

We study a routing problem that arises on SIMD parallel architectures whose communication
network forms a toroidal mesh. We assume there exists a set of $k$ message descriptors
$\{(x_i, y_i)\}$, where $(x_i, y_i)$ indicates that the $i$th message's recipient is offset from
its sender by $x_i$ hops in one mesh dimension and $y_i$ hops in the other. Every processor has
$k$ messages to send, and all processors use the same set of message routing descriptors. The
SIMD constraint implies that at any routing step, every processor is actively routing messages
with the same descriptors as any other processor. We call this Isomorphic Routing. Our objective
is to find the isomorphic routing schedule with least makespan. We consider a number of
variations on the problem, yielding complexity results from $O(k)$ to NP-complete. Most of our
results follow after we transform the problem into a scheduling problem, where it is related to
other well-known scheduling problems.
1  Introduction


The issue of routing messages in a parallel computer network has attracted a considerable
amount of attention. A host of problem variations exist. For example, some models presume
that every processor $i$ holds a number and that one wishes to implement some permutation
(e.g., [6]). Another variation is to assume that each processor $i$ has a list of messages, each
of which is destined for an arbitrary processor; this is known as "all-to-all personalized
communication" [4]. Our problem is a constrained case of all-to-all personalized communication
on an $n \times m$ toroidal mesh. It is also a constrained case of the general "compiled
communication" problem studied in [1], where the problem is to construct a communication
schedule for an irregular computation.

To begin with, in our problem we can always describe a message's destination in terms of its
offset from the source processor in both mesh dimensions $X$ and $Y$. Thus, a pair $(x, y)$
describes a message's routing requirements. Observe, however, that a message need not travel
exactly $x$ units in the $X$ dimension and $y$ in the $Y$: because of wrap-around, it may equally
well choose to travel $m - x$ units in $X$ and/or $n - y$ units in $Y$. Now imagine a parallel
computation where every processor performs the same computation, but on different data.
Further suppose that the pattern of messages every processor sends is the same, e.g., patterns
associated with discretization stencils [7]. We may thus describe the communication
requirements of the entire computation in terms of the offsets $\{(x_1, y_1), \ldots, (x_k, y_k)\}$
of the $k$ messages a single processor sends. We will say that the $n \times m$ different messages
with a common offset pair are all isomorphic.

Every processor has four communication ports, referenced as North, East, West, and
South (N, E, W, and S). We assume the communication links are full-duplex. We are interested
in SIMD (Single Instruction Multiple Data) architectures, where processors execute
the same instruction stream in lock-step. Unless the architecture provides special support
for local indirect addressing (which is much slower even when provided), an implication of
SIMD processing is that at every instant, the set of messages moving through all ports of
a common type (e.g., N) are isomorphic. We desire a routing schedule that minimizes the
time required to complete the communication, i.e., the makespan.

We  will  examine  variations  of  the  problem,  finding  they  have  a  surprising  range  of 
complexities.  The  variations  derive  from  assumptions  concerning  how  many  communication 
ports  may  be  active  at  a  time,  and  whether  a  message  must  be  fully  routed  once  it  begins 
moving  or  if  it  can  be  temporarily  buffered  at  an  intermediate  processor.  The  assumptions 
and  associated  complexities  are  given  below. 

•  One port active at a time: $O(k)$;

•  All ports active, temporary buffering allowed: $O(k \log k)$;

•  All ports active, no temporary buffering: NP-complete.

Let us now state the problem more formally. In a toroidal mesh of $n \times m$ processors, a
set of messages $M = \{m_1, m_2, \ldots, m_k\}$ is to be sent from each of the $n \times m$
processors (the sources) to some other processors (the destinations). Each message $m_i$ is
represented by a pair of integers $(x_i, y_i)$ giving the relative offset of its destination from
its source. Assume $0 \le x_i \le m - 1$ and $0 \le y_i \le n - 1$. We wish to design a schedule so
that all messages at each processor are sent to (and received by) their destinations in the
minimum amount of time. Depending on the problem variation, at any time a processor can send
one message in one of four directions (N, E, W, or S), or at any time a processor can send up to
four messages, one in each direction. We assume it always takes one time unit for a message
to traverse one link. We notice that for any message $m_i = (x_i, y_i)$ there are four possible
ways to send it: East and North $(x_i, y_i)$, East and South $(x_i, -(n - y_i))$, West and North
$(-(m - x_i), y_i)$, and finally West and South $(-(m - x_i), -(n - y_i))$. Because the mesh
is toroidal, they all reach the same destination. Depending on the problem variation, we
either assume that a message must be routed to completion in a successive series of steps,
or that a message's movement can be fragmented, e.g., one step N, two steps buffered, one
step W, another step N, and so on.
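
The four routing options can be made concrete with a minimal sketch. The function names `routing_options` and `same_destination` are illustrative assumptions, not part of the original formulation; the sign convention (positive for East and North, negative for West and South) follows the signed pairs listed above.

```python
def routing_options(x, y, m, n):
    """Return the four signed (dx, dy) displacements that deliver offset (x, y).

    Positive dx means East, negative dx means West; positive dy means North,
    negative dy means South. Assumes 0 <= x <= m - 1 and 0 <= y <= n - 1.
    """
    return {
        "EN": (x, y),
        "ES": (x, -(n - y)),
        "WN": (-(m - x), y),
        "WS": (-(m - x), -(n - y)),
    }


def same_destination(x, y, m, n):
    """Check that all four options land on the same processor of the torus."""
    options = routing_options(x, y, m, n)
    # Reduce each displacement modulo the mesh dimensions (wrap-around).
    targets = {(dx % m, dy % n) for dx, dy in options.values()}
    return len(targets) == 1
```

For instance, with $m = 3$ and $n = 2$, the offset $(2, 1)$ may be served by two hops East or by one hop West, which is the choice exploited in the example that follows.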

For example, in the $2 \times 3$ toroidal mesh shown in FIG. 1(a), 3 messages are to be sent:
$m_1 = (1, 0)$, $m_2 = (2, 1)$, and $m_3 = (0, 1)$. Assuming that all ports may be active
simultaneously, we easily determine that the makespan of the optimal schedule, denoted by
$C^*_{\max}$, is 2. From time 0 to time 1, each processor sends $m_1$ East to its destination, $m_2$
West, and $m_3$ North to its destination. From time 1 to time 2, each processor sends $m_2$
North to its destination. The schedule is illustrated in FIG. 1(b). Under our assumption of
isomorphic message passing, each processor does exactly the same thing at the same time.
Any time a processor sends out a message on one port (e.g., N), in the following time step
a message isomorphic to it is received on the opposite port (e.g., S), save that one unit of
routing service in one dimension (e.g., $Y$) has been given. This observation suggests that we
can approach the scheduling problem in terms of a single processor giving routing service
to each of its $k$ messages. The schedule for one processor can be shown by the traditional
Gantt chart as in FIG. 1(c).


FIG.  1.  An  example. 


A message may travel either direction within a dimension. This allows the possibility of
schedules that cause a message to "backtrack", e.g., move 3 units W, and later move 4 units
E. In the case when temporary buffering is provided at each processor, such a schedule can
always be improved (or at least not degraded) by removing the backtracking loop, whence if
$C^*_{\max}$ is the minimized makespan for an instance of the isomorphic routing problem, there
exists a backtracking-free schedule with cost $C^*_{\max}$. When there is no temporary buffering,
backtracking may be needed just to keep a message moving until it reaches its destination.
In the remainder we will confine our attention to backtracking-free schedules.

The problem defined above can be converted to an equivalent problem similar to
open shop scheduling. We are given four machines E, W, N, S, in which E and W are
identical in function but give different service times, as do N and S. There are $k$ jobs,
$J_1, J_2, \ldots, J_k$. Each job $J_i$ consists of two tasks $X_i$ and $Y_i$, where $X_i$ can only be
executed by E or W (but not both, because there is no backtracking), taking $x_i$ or $m - x_i$ time
units, respectively, and $Y_i$ can only be executed by N or S, taking $y_i$ or $n - y_i$ time units,
respectively. The integers $m$, $n$, $x_i$'s and $y_i$'s are as defined in the original problem above.




Tasks $X_i$ and $Y_i$ cannot be executed simultaneously at any time, but may be broken into
unit-time slices. However, in some problem variations a job $J_i$ may be suspended once it
has begun execution, a feature that corresponds with a message being buffered en route.
All execution periods of a task occur on the same machine. Our goal is to find a schedule to
execute all jobs so that the makespan, or maximum completion time, $C_{\max}$ is minimized.

This  problem  definition  suggests  a  new  group  of  problems,  which  we  call  multi-operation/ 
multi-machine scheduling problems. In the classical multi-operation model (">], each job
requires execution on more than one machine. In an open shop the order in which a job
passes  through  the  machines  is  immaterial,  whereas  in  a  flow  shop  each  job  has  the  same 
machine  ordering  and  in  a  job  shop  the  jobs  may  have  different  machine  orderings.  In 
the  multi-operation/multi-machine  model,  instead  of  having  just  one  machine  to  perform 
a  certain  kind  of  task  for  a  job,  there  is  a  back-up  machine  with  the  same  function  and  a 
possibly  different  cost. 

We can distinguish the situations in which a task requires identical service at either
common function machine, or has different service requirements that depend on the machine.
Our problem is a special case of the latter. In particular, we assume that for each pair of
common function machines there exists an integer $c$ ($c = n$ for N-S, $c = m$ for E-W) such
that a task with demand $x_i$ requires $x_i$ units on one machine and $c - x_i$ units on the other.
In this case we will say that the machines give complementary service.

In  the  remainder  we  will  refer  to  problem  variations  by  the  following  names. 

P1: Only one machine (out of all four) may be executing at a time.

P2: All four machines may execute simultaneously, jobs may be suspended, common function
machines give complementary service.

P3: All four machines may execute simultaneously, jobs may not be suspended, common
function machines give complementary service.

P4: All four machines may execute simultaneously, jobs may be suspended, common function
machines give uniform service.

P5: All four machines may execute simultaneously, jobs may not be suspended, common
function machines give uniform service.

P1, P2, and P3 have meaning in the context of the isomorphic routing problem; P4 and
P5 are natural variations of the multi-operation/multi-machine scheduling problem. We
will establish complexity bounds on each of these problems.

We organize this paper as follows. In Section 2, we study the complexity of all the
problems above, save P2. P1 is shown to be $O(k)$, while the other variations are shown to
be NP-complete. Section 3 develops an algorithm for problem P2, and Section 4 develops
an $O(k \log k)$ implementation of the algorithm. Section 5 presents our conclusions. The
Appendix proves some useful lemmas in detail.

2  Complexity results for P1, P3, P4, and P5

Problem P1 allows only one machine to be executing at a time. The solution is trivial. Step
through the jobs sequentially, giving exhaustive service to one task, and then the other, in
each case selecting the machine which serves the task most quickly. $2k$ comparisons are
performed in the course of selecting machines, giving the algorithm complexity $O(k)$. While
not very interesting in the scheduling context, the situation follows from the isomorphic
routing problem under the constraint that at any step, only one communication port can
be active. This is a seemingly natural constraint, but is not always required. For example,
the Thinking Machines CM-2 is able to communicate on all ports simultaneously [1].
Indeed, the problem studied in [1] is similar to ours, in that it seeks to schedule communication
(albeit irregular, as opposed to our isomorphic assumption) on the CM-2's hypercube
communication network.
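
The sequential procedure for P1 can be sketched as follows. This is a minimal illustration under the assumptions stated in the text (complementary service, one machine active at a time); the function name `p1_makespan` and the machine labels in the returned plan are our own.

```python
def p1_makespan(messages, m, n):
    """Makespan of P1: serve every task on whichever machine is quicker.

    messages: list of (x, y) offsets with 0 <= x <= m - 1, 0 <= y <= n - 1.
    Returns (makespan, plan), where plan lists (machine, time) per task
    in the order the tasks are served.
    """
    total = 0
    plan = []
    for x, y in messages:
        # One comparison per task, 2k comparisons in all.
        x_choice = ("E", x) if x <= m - x else ("W", m - x)
        y_choice = ("N", y) if y <= n - y else ("S", n - y)
        plan.extend([x_choice, y_choice])
        total += x_choice[1] + y_choice[1]
    return total, plan
```

Because only one machine runs at a time, the makespan is simply the sum of the chosen service times, so no ordering decision affects the result.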

Next we show that P3, P4 and P5 are NP-complete. First consider P4, where common
function machines give uniform service. Assume that machines $M_1$, $M_2$ are identical, as
are $M_3$, $M_4$. There are $k$ jobs, $J_1, J_2, \ldots, J_k$. Each job $J_i$ consists of two tasks $X_i$
and $Y_i$, where $X_i$ can only be executed by $M_1$ or $M_2$, taking $x_i$ time units on either machine,
and $Y_i$ can only be executed by $M_3$ or $M_4$, taking $y_i$ time units on either machine. A
job may be suspended, but may never have both its tasks receiving service simultaneously.
Our goal is to find a schedule with the minimum makespan $C_{\max}$. We shall next prove
that whether we allow preemption of tasks or not, the problem is always NP-complete.
Note that the NP-completeness of this formulation (an open shop scheduling problem of
identical back-up machines with or without preemption) implies the intractability of all
general multi-operation/multi-machine scheduling problems.

Theorem  1  P4  is  NP-complete. 

Proof. Consider the corresponding decision problem, in which given a bound $B$, we are
asked whether there is a schedule with $C_{\max} \le B$. For any instance of the NP-complete
problem PARTITION [2], given $A = \{a_1, a_2, \ldots, a_k\}$ (positive integers) we construct an
instance of the decision problem, in which there are $k + 2$ jobs, $x_i = a_i$ and $y_i = 0$ for
$i = 1, 2, \ldots, k$, $x_{k+1} = x_{k+2} = \frac{1}{2}\sum_{i=1}^{k} a_i + 1$ and $y_{k+1} = y_{k+2} = 0$, and
finally $B = \sum_{i=1}^{k} a_i + 1$. We claim that there exists $A' \subseteq A$ such that
$\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$ iff there is a schedule with $C_{\max} \le B$ for the
instance defined.

If there exists $A' \subseteq A$ such that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$ (for
notational simplicity assume that $A' = \{a_1, a_2, \ldots, a_h\}$), then we can construct a
schedule with $C_{\max} = B$ as shown in FIG. 2. Even though the schedule constructed does not
preempt any task, it is also feasible for the instance that allows preemption, since
non-preemption is considered as a special case of preemption. As a matter of fact, the
schedule in FIG. 2 is the best possible since for any feasible schedule
$C_{\max} \ge \lceil \frac{1}{2}\sum_{i=1}^{k+2} x_i \rceil = \sum_{i=1}^{k} a_i + 1 = B$.

FIG. 2. A schedule with $C_{\max} = B$ for the instance of the decision problem of P4.

If there exists a schedule with $C_{\max} = B$, then the two big tasks $X_{k+1}$ and $X_{k+2}$ cannot
be scheduled on one machine since otherwise $x_{k+1} + x_{k+2} = \sum_{i=1}^{k} a_i + 2 > B$. Without
loss of generality, assume $X_{k+1}$ is scheduled on $M_1$ and $X_{k+2}$ is scheduled on $M_2$. For
the remaining $k$ $X$-type tasks $X_1, \ldots, X_k$, because
$\sum_{i=1}^{k} x_i = \sum_{i=1}^{k} a_i = 2B - (x_{k+1} + x_{k+2})$,
$M_1$ and $M_2$ are not idle from time 0 to time $B$. Without loss of generality, assume tasks
$X_1, \ldots, X_h$ are scheduled on $M_1$, and tasks $X_{h+1}, \ldots, X_k$ are scheduled on $M_2$. We have
$\sum_{i=1}^{h} x_i = \sum_{i=h+1}^{k} x_i = \frac{1}{2}\sum_{i=1}^{k} a_i$. This is true regardless of
whether preemption of tasks is allowed or not. So there exists $A' = \{a_1, \ldots, a_h\} \subseteq A$
such that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$. ∎
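
The construction used in the proof of Theorem 1 can be written out as a short sketch. The helper name `build_p4_instance` is hypothetical, and we additionally assume the sum of the $a_i$ is even, since otherwise the PARTITION instance is trivially a "no" instance and $\frac{1}{2}\sum a_i$ is not an integer.

```python
def build_p4_instance(a):
    """Build the P4 decision instance used in the reduction from PARTITION.

    a: list of positive integers with an even sum (an assumption made here
    so that half the sum is an integer).
    Returns (jobs, bound), where jobs is a list of (x_i, y_i) pairs.
    """
    total = sum(a)
    if total % 2 != 0:
        raise ValueError("odd sum: PARTITION instance is trivially infeasible")
    big = total // 2 + 1                 # x_{k+1} = x_{k+2}
    jobs = [(ai, 0) for ai in a]         # x_i = a_i, y_i = 0
    jobs += [(big, 0), (big, 0)]         # the two big jobs
    bound = total + 1                    # B
    return jobs, bound
```

The total $X$-type work in the instance equals $2B$, so a schedule meeting the bound must keep both $M_1$ and $M_2$ busy throughout $[0, B]$, which is the property the proof relies on.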

Now  let  us  consider  P5,  in  which  a  job’s  service  must  be  continuous,  and  common 
function  machines  give  uniform  service.  The  requirement  of  continuity  does  not  prohibit 
the  tasks  from  being  broken  into  slices  which  are  independently  scheduled,  so  long  as  a 
job’s  execution  is  not  interrupted.  It  is  easy  to  see  that  the  proof  of  Theorem  1  can  be 
used  without  any  change  to  prove  the  NP-completeness  of  problem  P5  in  both  cases  of 
preemption  and  non-preemption.  Thus  we  have  the  additional  result: 

Theorem 2  P5 is NP-complete.

Now suppose that a job's service must be continuous, and that common function machines
give complementary service. We assume that a task can be broken into unit-time
slices.  This  formulation  corresponds  directly  to  an  isomorphic  routing  problem  where  we 
require  that  once  begun,  a  message  continues  to  move  at  each  step  until  it  reaches  its 
destination.  It  turns  out  that  this  variation  is  also  intractable. 

Theorem 3  P3 is NP-complete.

Proof. Consider the corresponding decision problem, in which given a bound $B$, we are
asked whether there is a schedule with $C_{\max} \le B$. For any instance of the NP-complete
problem PARTITION, given $A = \{a_1, a_2, \ldots, a_k\}$ (positive integers) we construct an
instance of the decision problem as follows. Let $m$ and $n$ be two integers much larger than
$\sum_{i=1}^{k} a_i + 2$. We are given four machines $M_1$, $M_2$, $M_3$, $M_4$, where $M_1$ and $M_2$ are
identical in function but give complementary service ($x$ and $m - x$ for workload $x$,
respectively), as do $M_3$ and $M_4$ ($y$ and $n - y$ for workload $y$, respectively). There are
$k + 4$ jobs, $J_1, J_2, \ldots, J_{k+4}$, each of which consists of an $X$-type task and a $Y$-type
task. Let $x_i = a_i$ and $y_i = 0$ for $i = 1, 2, \ldots, k$,
$x_{k+1} = m - \frac{1}{2}\sum_{i=1}^{k} a_i$ and $y_{k+1} = \frac{1}{2}\sum_{i=1}^{k} a_i + 1$,
$x_{k+2} = m - \frac{1}{2}\sum_{i=1}^{k} a_i$ and $y_{k+2} = n - \frac{1}{2}\sum_{i=1}^{k} a_i - 1$,
$x_{k+3} = 1$ and $y_{k+3} = \frac{1}{2}\sum_{i=1}^{k} a_i$,
$x_{k+4} = m - 1$ and $y_{k+4} = n - \frac{1}{2}\sum_{i=1}^{k} a_i$.
Finally, let $B = \sum_{i=1}^{k} a_i + 1$. We claim that there exists $A' \subseteq A$ such
that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$ iff there is a schedule with
$C_{\max} \le B$ for the instance defined.

If there exists $A' \subseteq A$ such that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$ (for
notational simplicity, assume $A' = \{a_1, a_2, \ldots, a_h\}$), then we can construct a schedule
with $C_{\max} = B$ as shown in FIG. 3. As a matter of fact, the schedule in FIG. 3 is the best
possible since for any feasible schedule
$C_{\max} \ge \lceil \frac{1}{4}\sum_{i=1}^{k+4} (\min\{x_i, m - x_i\} + \min\{y_i, n - y_i\}) \rceil = \sum_{i=1}^{k} a_i + 1 = B$.


FIG. 3. A schedule with $C_{\max} = B$ for the instance of the decision problem of P3.

If there exists a schedule with $C_{\max} = B$, then $X_1, X_2, \ldots, X_k$ and $X_{k+3}$ must be
scheduled on $M_1$, $X_{k+1}$, $X_{k+2}$, $X_{k+4}$ on $M_2$, $Y_{k+1}$, $Y_{k+3}$ on $M_3$, and $Y_{k+2}$,
$Y_{k+4}$ on $M_4$. Since $X_{k+1}$ and $Y_{k+1}$ cannot be executed simultaneously, $M_2$ executes
$X_{k+1}$ at the same time $M_3$ executes $Y_{k+3}$. So we say that the executions of $X_{k+1}$ and
$Y_{k+3}$ are completely parallel. Since the executions of $X_{k+3}$ and $Y_{k+3}$ are continuous, so
are the executions of $X_{k+3}$ and $X_{k+1}$. Similarly, we can show that the executions of
$X_{k+4}$ and $X_{k+2}$ are also continuous. How can the schedule have $X_{k+3}$ on $M_1$ and $X_{k+1}$,
$X_{k+2}$, and $X_{k+4}$ on $M_2$ such that the continuity of $X_{k+3}$ and $X_{k+1}$ and the continuity
of $X_{k+4}$ and $X_{k+2}$ are both respected? It is not hard to see that $X_{k+3}$ must be scheduled
from time $\frac{1}{2}\sum_{i=1}^{k} a_i$ to time $\frac{1}{2}\sum_{i=1}^{k} a_i + 1$. Therefore the
set $\{X_1, X_2, \ldots, X_k\}$ is divided into two sets of equal sums. So there exists
$A' \subseteq A$ such that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$. ∎
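
As a sanity check on the instance used in the proof of Theorem 3, the following sketch builds the $k + 4$ jobs and evaluates the work-based lower bound quoted in the proof. The names `build_p3_instance` and `work_lower_bound` are hypothetical, and the sketch assumes an even sum of the $a_i$ and values of $m$ and $n$ much larger than that sum, as the proof requires.

```python
def build_p3_instance(a, m, n):
    """Jobs (x_i, y_i) and bound B for the P3 reduction from PARTITION.

    Assumes sum(a) is even and m, n are much larger than sum(a) + 2.
    """
    total = sum(a)
    half = total // 2
    jobs = [(ai, 0) for ai in a]            # jobs 1..k
    jobs += [
        (m - half, half + 1),               # job k+1
        (m - half, n - half - 1),           # job k+2
        (1, half),                          # job k+3
        (m - 1, n - half),                  # job k+4
    ]
    return jobs, total + 1


def work_lower_bound(jobs, m, n):
    """ceil(total cheapest work / 4): no schedule on 4 machines beats this."""
    work = sum(min(x, m - x) + min(y, n - y) for x, y in jobs)
    return -(-work // 4)                    # integer ceiling division
```

With $A = \{1, 2, 3\}$ and $m = n = 100$, the cheapest total work is 28 units over four machines, so the lower bound equals $B = 7$, consistent with the claim that the schedule of FIG. 3 is the best possible.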

We are left now with the problem of analyzing P2. This will require most of the remainder
of the paper. Our approach will be to recognize that P2 is a variation on a scheduling
problem, denoted by P'2, where the decision of which machine to use for any given task is
a given input parameter; the sign of $x_i$ or $y_i$ determines which machine to use. In the
isomorphic routing problem this is equivalent to specifying the specific directions the message
must travel. This might arise, for instance, if the N and E ports could be used only for
sending messages, whereas the S and W ports could be used only for receiving them.

Assuming that machine usage is pre-specified, the resulting problem P'2 is related to a
paper by Gonzalez and Sahni [3]. The paper studies general open shop scheduling with
preemption, and proves that $C^*_{\max} = \alpha = \max_{i,j}\{T_i, L_j\}$, where $T_i$ is the sum of
execution time of all tasks scheduled on machine $M_i$ and $L_j$ is the sum of execution time of
all tasks of job $J_j$. To construct the optimal schedule for any instance with $m$ machines, $n$
jobs, and $r$ nonzero tasks, an $O(r(\min\{r, m^2\} + m \log n))$ algorithm is presented.

It is easy to see that P'2 is in fact a special case of this open shop scheduling problem, in
which parameters are integers and preemptions are only allowed at the integral points.
Furthermore, we also notice that the minimum makespan for any instance of P'2, $C^*_{\max}$, is
at least

$$\alpha = \max_{i,j}\{T_i, L_j\} = \max\Big\{ \sum_{\forall x_i > 0} x_i,\ \sum_{\forall x_i < 0} |x_i|,\ \sum_{\forall y_i > 0} y_i,\ \sum_{\forall y_i < 0} |y_i|,\ \max_i\{|x_i| + |y_i|\} \Big\}.$$

When we apply Gonzalez and Sahni's algorithm to P'2, we have an optimal preemptive
schedule with $C^*_{\max} = \alpha$. Since all preemptions occur at the integral points, this is
actually the optimal solution to P'2. The time complexity of Gonzalez and Sahni's algorithm
when applied to P'2 is $O(k \log k)$.

In view of this result, our approach will be to take a problem instance of P2, and
determine the machine assignments that minimize $\alpha$. Gonzalez and Sahni's algorithm
may then be applied to construct the actual schedule.
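
To make the quantity $\alpha$ concrete, here is a minimal sketch that evaluates it for an instance of P'2, where each message is given as a signed offset (positive for E or N, negative for W or S). The function name `open_shop_bound` is an assumption for illustration; the sketch only computes the bound and does not construct Gonzalez and Sahni's schedule.

```python
def open_shop_bound(signed_offsets):
    """alpha = max(machine loads, job lengths) for pre-specified directions.

    signed_offsets: list of (dx, dy); dx > 0 uses E, dx < 0 uses W,
    dy > 0 uses N, dy < 0 uses S. Zero entries add no load.
    """
    loads = {"E": 0, "W": 0, "N": 0, "S": 0}
    longest_job = 0
    for dx, dy in signed_offsets:
        loads["E" if dx > 0 else "W"] += abs(dx)   # T_i for the E/W pair
        loads["N" if dy > 0 else "S"] += abs(dy)   # T_i for the N/S pair
        longest_job = max(longest_job, abs(dx) + abs(dy))  # L_j
    return max(max(loads.values()), longest_job)
```

For the example of Section 1, routing $m_1$ East, $m_2$ West then North, and $m_3$ North corresponds to `[(1, 0), (-1, 1), (0, 1)]`, for which the bound evaluates to 2, matching the makespan of the schedule in FIG. 1.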

3  An  algorithm  for  P2 

As pointed out in the last section, solving P2 can be reduced to the problem of finding the
task-to-machine assignment that minimizes the makespan. The actual schedule can then be
determined in $O(k \log k)$ time using the algorithm of Gonzalez and Sahni. In this section
we develop an algorithm that makes the needed assignment.

We abstract our problem as follows. We are given two sets of items, $X = \{X_1, X_2, \ldots, X_k\}$
and $Y = \{Y_1, Y_2, \ldots, Y_k\}$, and nonnegative integers $x_1, x_2, \ldots, x_k$,
$y_1, y_2, \ldots, y_k$, $m$, and $n$, where $x_i \le m - 1$ and $y_i \le n - 1$ for all $i$. We must
define a function $F : X \cup Y \to \mathbb{N}$ with $F(X_i) = x_i$ or $m - x_i$ and $F(Y_i) = y_i$ or
$n - y_i$ such that $\alpha = \max_{i,j}\{T_i, L_j\} = \max\{\alpha_1, \alpha_2, \alpha_3\}$ is
minimized, where

$$\alpha_1 = \max\Big\{ \sum_{\forall i\,(F(X_i) = x_i)} x_i,\ \sum_{\forall i\,(F(X_i) = m - x_i)} (m - x_i) \Big\},$$

$$\alpha_2 = \max\Big\{ \sum_{\forall i\,(F(Y_i) = y_i)} y_i,\ \sum_{\forall i\,(F(Y_i) = n - y_i)} (n - y_i) \Big\},$$

$$\alpha_3 = \max_{\forall i}\{F(X_i) + F(Y_i)\}.$$
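
The objective can be evaluated directly for any candidate assignment, which also yields an exponential-time reference solver useful for checking small instances. The sketch below is our own illustration, not the algorithm developed in this section; the names `alpha` and `brute_force_alpha` and the boolean encoding of $F$ are assumptions.

```python
from itertools import product


def alpha(xs, ys, m, n, x_west, y_south):
    """Evaluate max(alpha_1, alpha_2, alpha_3) for one assignment F.

    x_west[i] is True when F(X_i) = m - x_i (otherwise F(X_i) = x_i);
    y_south[i] is True when F(Y_i) = n - y_i (otherwise F(Y_i) = y_i).
    """
    fx = [m - x if west else x for x, west in zip(xs, x_west)]
    fy = [n - y if south else y for y, south in zip(ys, y_south)]
    east = sum(v for v, west in zip(fx, x_west) if not west)
    west_load = sum(v for v, west in zip(fx, x_west) if west)
    north = sum(v for v, south in zip(fy, y_south) if not south)
    south_load = sum(v for v, south in zip(fy, y_south) if south)
    alpha_1 = max(east, west_load)
    alpha_2 = max(north, south_load)
    alpha_3 = max((a + b for a, b in zip(fx, fy)), default=0)
    return max(alpha_1, alpha_2, alpha_3)


def brute_force_alpha(xs, ys, m, n):
    """Minimum alpha over all 4^k assignments (only sensible for tiny k)."""
    k = len(xs)
    return min(
        alpha(xs, ys, m, n, bits[:k], bits[k:])
        for bits in product([False, True], repeat=2 * k)
    )
```

On the three-message example of Section 1 (`xs = [1, 2, 0]`, `ys = [0, 1, 1]`, `m = 3`, `n = 2`) the reference solver returns 2.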

We first describe an algorithm A that defines a function $f : X \cup Y \to \mathbb{N}$ with
$f(X_i) = x_i$ or $m - x_i$, and $f(Y_i) = y_i$ or $n - y_i$ such that the resulting $\alpha_1$ and
$\alpha_2$ are both minimized. Following this, we look at how $F$ may deviate from $f$, and show
how to modify $f$ so as to create $F$.

Algorithm A

1. Sort $x_1, x_2, \ldots, x_k$ and $y_1, y_2, \ldots, y_k$ in nondecreasing order, separately.

2. Do the following to each sorted list. The pseudo-code below defines $f_X : X \to \mathbb{N}$ with
$f_X(X_i) = x_i$ or $m - x_i$ such that the resulting $\alpha_1$ is minimized. To define
$f_Y : Y \to \mathbb{N}$ for the minimum $\alpha_2$, we simply replace the notations for the $X$-list by
the corresponding notations for the $Y$-list.

For notational simplicity, assume $x_1, x_2, \ldots, x_k$ are in nondecreasing order:

    α1^L ← 0;  α1^R ← 0;  i ← 1;  j ← k;
    while i ≤ j do
        if α1^L + x_i ≤ α1^R + (m − x_j)
        then { f_X(X_i) ← x_i;  α1^L ← α1^L + x_i;  i ← i + 1 }
        else { f_X(X_j) ← m − x_j;  α1^R ← α1^R + (m − x_j);  j ← j − 1 };
    α1 ← max{α1^L, α1^R};

3. $f : X \cup Y \to \mathbb{N}$ is the combination of $f_X$ and $f_Y$.

We recognize $\alpha_1^L$ as accumulating the first term in $\alpha_1$, and $\alpha_1^R$ as accumulating
the second. Given the sorted ordering of the $x_i$'s, the algorithm finds a turning point $t$, where
$f(X_i) = x_i$ for $i \le t$, and $f(X_i) = m - x_i$ for $i > t$; furthermore, among all such turning
points the one chosen minimizes $\max\{\alpha_1^L, \alpha_1^R\}$. That this algorithm defines
$\alpha_1^*$ follows from the fact that the optimal schedule must have this structure, for suppose
not. Assume there are $p$ and $q$ with $1 \le p < q \le k$ such that $f(X_p) = m - x_p$ and
$f(X_q) = x_q$. Since $x_p \le x_q$ and $m - x_p \ge m - x_q$, it follows that
$\max\{m - x_p, x_q\} \ge \max\{x_p, m - x_q\}$, so that changing the assignment for $X_p$ and $X_q$
does not increase $\alpha_1$. We may apply this argument repeatedly until the resulting assignment
exhibits a turning point, as claimed.
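
The two-pointer pass of Algorithm A on one list can be sketched as below. This is a hedged rendering rather than the authors' implementation: the function name `algorithm_a`, the tie-breaking toward the left column, and the return format are our assumptions. The parameter `c` stands for $m$ on the $X$-list and $n$ on the $Y$-list.

```python
def algorithm_a(values, c):
    """Assign each demand v either v (left) or c - v (right).

    Returns (alpha, chosen), where chosen[i] is the service time picked
    for the i-th task in the caller's original order and alpha is the
    larger of the two accumulated loads.
    """
    # Step 1: visit the tasks in nondecreasing order of demand.
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    chosen = [None] * len(values)
    left = right = 0
    i, j = 0, len(order) - 1
    # Step 2: grow the left column from the small end and the right
    # column from the large end until the two pointers cross.
    while i <= j:
        small, large = order[i], order[j]
        if left + values[small] <= right + (c - values[large]):
            chosen[small] = values[small]
            left += values[small]
            i += 1
        else:
            chosen[large] = c - values[large]
            right += c - values[large]
            j -= 1
    return max(left, right), chosen
```

Calling `algorithm_a(xs, m)` and `algorithm_a(ys, n)` gives the two halves of $f$; the sort dominates the running time, so each call takes $O(k \log k)$.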

FIG. 4 shows an example of using algorithm A to compute the optimal value of $\alpha_1$. The
numbers in the circles are the values of function $f_X$ of the corresponding tasks. We also
illustrate $\alpha_1^L$ and $\alpha_1^R$ as functions of index, even though the algorithm will not
generate all such values we display. From now on, we shall use diagrams similar to FIG. 4 but
without the $\alpha_1^L$, $\alpha_1^R$ values to represent the definition of $f$, which we will also
call assignment diagrams.

FIG. 4. An example of using algorithm A to compute $\alpha_1^*$.


We see from the above discussion that $\alpha_1^*$ and $\alpha_2^*$, the optimal values of $\alpha_1$
and $\alpha_2$, can be obtained by algorithm A in time $O(k \log k)$, while $\alpha_3^*$, the optimal
value of $\alpha_3$, can be obtained by choosing $\min\{x_i, m - x_i\}$ for task $X_i$ and
$\min\{y_i, n - y_i\}$ for task $Y_i$. However, the difficulty we face is that these optimalities
may not be achieved at the same time, i.e., the assignment minimizing $\alpha_1$ and $\alpha_2$ may
not be consistent with the assignment minimizing $\alpha_3$. To highlight the differences we will
say that $f(X_i)$ (alt., $f(Y_i)$) is a bad choice if $f(X_i) \ne \min\{x_i, m - x_i\}$ (alt.,
$f(Y_i) \ne \min\{y_i, n - y_i\}$), and that $f(X_i)$ (alt., $f(Y_i)$) is a disastrous choice if
$f(X_i) \ne \min\{x_i, m - x_i\}$ (alt., $f(Y_i) \ne \min\{y_i, n - y_i\}$) and
$f(X_i) + f(Y_i) > \alpha^*$, where $\alpha^* = \max\{\alpha_1^*, \alpha_2^*, \alpha_3^*\}$. In the
example in FIG. 4, the shaded circles represent the bad choices. It is easy to see that bad
choices always form a contiguous block which includes the turning point. Without loss of
generality, assume that the block of bad choices is in the left column of the assignment
diagram and ends at the turning point. We observe that if $f$ contains no disastrous choices,
then every job satisfies $f(X_i) + f(Y_i) \le \alpha^*$, so $f$ attains
$\alpha^* = \max\{\alpha_1^*, \alpha_2^*, \alpha_3^*\}$, which no assignment can improve upon, and
$F = f$. Should $f$ contain disastrous choices, we need to consider modifying it in order to
find the function $F$ with the minimum $\alpha$.
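
The definitions of bad and disastrous choices translate directly into a small classifier. The sketch below is illustrative only: the function name `classify_choices` and its dictionary output are our assumptions, and $\alpha^*$ is passed in as a parameter rather than recomputed.

```python
def classify_choices(xs, ys, m, n, fx, fy, alpha_star):
    """Return the indices of bad and disastrous choices in f.

    fx[i] and fy[i] are the service times chosen by f for X_i and Y_i;
    alpha_star is max(alpha_1*, alpha_2*, alpha_3*), supplied by the caller.
    """
    result = {"bad_x": [], "bad_y": [], "disastrous_x": [], "disastrous_y": []}
    for i, (x, y) in enumerate(zip(xs, ys)):
        # The job is too long if its two chosen tasks exceed alpha*.
        too_long = fx[i] + fy[i] > alpha_star
        if fx[i] != min(x, m - x):          # X_i uses the slower machine
            result["bad_x"].append(i)
            if too_long:
                result["disastrous_x"].append(i)
        if fy[i] != min(y, n - y):          # Y_i uses the slower machine
            result["bad_y"].append(i)
            if too_long:
                result["disastrous_y"].append(i)
    return result
```

When both disastrous lists come back empty, the observation above applies and $f$ can be used as $F$ without further search.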

Let us assume then that we have computed an assignment $f$ by applying Algorithm A
to the $X$ list (and so find the $X$ assignment function $f_X$), and to the $Y$ list (and so find
the $Y$ assignment function $f_Y$), and have identified at least one disastrous choice. $f$ may or
may not be the optimal assignment $F$. We have developed a number of results that help us
to identify jobs $J_i$ for which it may be possible that $f(X_i) \ne F(X_i)$ or $f(Y_i) \ne F(Y_i)$.
Most importantly, these results severely constrain the number of tasks whose assignment in $f$
can differ from their assignment in $F$. Given $f$, we will identify a set of possible assignment
switches to consider; the least cost assignment among these will be the optimal assignment.
We show that for the given $f$, only $O(k \log k)$ alternative assignments must be considered,
whence the optimal assignment is found in $O(k \log k)$ time.

We proceed now by making some definitions, and stating certain results founded upon
them (proofs are relegated to the Appendix). Without loss of generality, we assume $m \ge n$
in the remainder.

Given $f$, let $B_X$ and $B_Y$ be the sets of bad choices in $f_X$ and $f_Y$, respectively, and $D_X$
and $D_Y$ be the sets of disastrous choices in $f_X$ and $f_Y$, respectively. Now, in the assignment
diagrams of $f_X$ and $f_Y$, let $X_L$ and $Y_L$ be the sets of choices in the left columns of $f_X$ and
$f_Y$, respectively, and $X_R$ and $Y_R$ be the sets of choices in the right columns of $f_X$ and
$f_Y$, respectively. We denote the sets of tasks whose assignment differs under $f$ and $F$ as
$U_X \subseteq X_L$, $V_X \subseteq X_R$, $U_Y \subseteq Y_L$, $V_Y \subseteq Y_R$. We use
$\alpha_1(U_X, V_X)$ (alt., $\alpha_2(U_Y, V_Y)$) to denote the corresponding $\alpha_1$ (alt.,
$\alpha_2$) resulting from the switches in $U_X$, $V_X$ (alt., $U_Y$, $V_Y$). Finally,
we will say that assignment $f(X_i)$ (alt., $f(Y_i)$) is a potential switch if either $f(X_i)$ (alt.,
$f(Y_i)$) is a disastrous choice, or $f(X_i)$ (alt., $f(Y_i)$) is a bad choice while $f(Y_i)$ (alt.,
$f(X_i)$) is in $V_Y$ (alt., $V_X$), and $f(X_i) + n - f(Y_i) > \alpha_2(U_Y, V_Y)$ (alt.,
$m - f(X_i) + f(Y_i) > \alpha_1(U_X, V_X)$).

The next four results serve to constrain the number of switches we must consider.

Lemma 1  If $|B_X| \ge 3$, then $F = f$.

Lemma 2  $|D_Y| \le 2$.

Lemma 3  $|U_X| \ge |V_X|$ and $|U_Y| \ge |V_Y|$.

Lemma 4  All members of $U_X$ and $U_Y$ are potential switches.

Now consider the implications of these results. By Lemma 1 we only have to worry about
situations when $|B_X| \le 2$. By Lemma 4 we know that $U_X$ contains only potential switches,
which are recognizable bad choices. There are at most 4 different combinations of changing
or not changing the assignments of bad choices in the left column of $f_X$. By Lemma 3 we
know that at most two assignments in the right column of $f_X$ may change. For each fixed
combination of changes to $f_X$'s left column we need consider no more than $O(\binom{k}{2})$ pairs of
possible changes to assignments in $f_X$'s right column. We also need to consider possible
changes to $f_Y$. Lemma 2 tells us $|D_Y| \le 2$; Lemma 3 tells us $|V_Y| \le |U_Y|$; Lemma 4 tells us
that $U_Y$ may contain only potential switches, which again are either disastrous choices in $f_Y$,
or bad choices $f(Y_i)$ with $f(X_i) \in V_X$. It follows that $|V_Y| \le |U_Y| \le |D_Y| + |V_X| \le 4$. This
means that for every fixed combination of switched/non-switched assignments of potential
switches in the left column of $f_Y$, we need consider no more than all switched/non-switched
combinations of four good choices from the right column of $f_Y$. There are $O(\binom{k}{4})$ of these.
Considering all combinations of possible changes to $f_X$ and possible changes to $f_Y$ requires
time $O(\binom{k}{2} \cdot \binom{k}{4}) = O(k^6)$.
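As a quick check of the brute-force bound, the product of the two binomial coefficients can be expanded directly. The short derivation below treats the constant number of left-column combinations as a constant factor, which is an assumption of this sketch rather than a statement taken from the text.

```latex
\[
\binom{k}{2}\binom{k}{4}
  = \frac{k(k-1)}{2}\cdot\frac{k(k-1)(k-2)(k-3)}{24}
  \le \frac{k^{6}}{48}
  = O(k^{6}).
\]
```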

We  describe  this  algorithm  for  problem  P2  formally  as  follows. 

1. If $m < n$, rotate the mesh by 90 degrees, and redefine the parameters in the new
coordinate system.

2. Use algorithm A twice to define f which minimizes $\max\{\alpha_1, \alpha_2\}$.

3. If the block of bad choices in an assignment diagram is not in the left column, rotate
the diagram by 180 degrees, and exchange the roles of the two machines involved in
the assignment.

4. If neither $f_X$ nor $f_Y$ contains a disastrous choice, let F be f and go to step 6.
Otherwise continue in step 5.

5. List all possible definitions of $U_X$, $V_X$ and $U_Y$, $V_Y$. For each possible combination of
$U_X$, $V_X$, $U_Y$, $V_Y$, compute its $\alpha$. Let F be the function determined by the $U_X$, $V_X$, $U_Y$,
$V_Y$ which together result in the smallest $\alpha$.

6. Use Gonzalez and Sahni's algorithm to construct the schedule with $C_{max} = \alpha$.
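The brute-force form of step 5 can be sketched as a plain enumeration. This is a minimal sketch, not the authors' implementation: the function names, the representation of choices as arbitrary hashable labels, and the `cost` callable (standing in for the computation of the resulting value of alpha) are all hypothetical. The only property taken from the text is the restriction |V| <= |U| from Lemma 3.

```python
from itertools import combinations


def candidate_switch_sets(potential_switches, right_column):
    """Yield every (U, V) with U drawn from the potential switches of the
    left column, V drawn from the right column, and |V| <= |U| (Lemma 3)."""
    for u_size in range(len(potential_switches) + 1):
        for U in combinations(potential_switches, u_size):
            for v_size in range(u_size + 1):
                for V in combinations(right_column, v_size):
                    yield U, V


def brute_force_step5(pot_x, right_x, pot_y, right_y, cost):
    """Return (value, U_X, V_X, U_Y, V_Y) of least cost.

    `cost` is a hypothetical callable that evaluates the alpha obtained
    after applying the switches; it is an assumption of this sketch."""
    best = None
    for U_x, V_x in candidate_switch_sets(pot_x, right_x):
        for U_y, V_y in candidate_switch_sets(pot_y, right_y):
            value = cost(U_x, V_x, U_y, V_y)
            if best is None or value < best[0]:
                best = (value, U_x, V_x, U_y, V_y)
    return best
```

Because the number of potential switches is bounded by a constant, the enumeration over U is cheap; the cost is dominated by the choices of V from the right columns, which is what the case analysis of the next section reduces.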

In this algorithm, steps 1, 3, and 4 each take $O(k)$ time, while steps 2 and 6 each take
$O(k \log k)$ time. We also know that for step 5, even if we use the brute-force method of
checking all possible combinations of $U_X$, $V_X$, $U_Y$, $V_Y$, the time needed is still polynomial,
$O(k^6)$. In the next section, we shall show that step 5 can in fact be implemented in time
$O(k \log k)$, thus yielding an $O(k \log k)$ algorithm for P2.

4  An $O(k \log k)$ implementation of the algorithm

The previous section demonstrated that the routing problem has polynomial complexity. We
can drive the asymptotic complexity to $O(k \log k)$, but at the price of tremendous complication
in the algorithm. Our results may be primarily of theoretical interest; our algorithm
can be implemented, but suffers from a lack of elegance. One hopes that additional work
on the problem may yield a more intuitive solution.

Let us now consider the following three cases: $|B_X| = 0$, $|B_X| = 1$, and $|B_X| = 2$.
We shall prove that in each case the function F, which minimizes $\alpha$, can be obtained in
$O(k \log k)$ time by switching some assignments in the function f. We will use the next three
lemmas to help reduce the number of possible combinations we must consider. Their proofs
can be found in the Appendix.

Lemma 5 If $|B_X| = 0$, then $|U_X| = |V_X| = 0$ and $|D_Y| \le 2$.

Lemma 6 If $|B_X| = 1$, then $|V_X| \le |U_X| \le 1$ and $|D_Y| \le 2$. Furthermore, if $D_Y =
\{f(Y_1), f(Y_2)\}$, then $\alpha^* < f(X_1) + f(X_2)$ and one of $f(X_1)$ and $f(X_2)$ is the largest bad
choice in $f_X$.

Lemma 7 If $|B_X| = 2$, then $|D_X| \le 1$ and $|D_Y| \le 1$. Furthermore, if $D_X = \{f(X_1)\}$, then
$D_Y = \{f(Y_1)\}$; if $D_Y = \{f(Y_1)\}$ and $f(X_1) \notin B_X$, then $f(X_1)$ must be in the right column
in the assignment diagram of $f_X$.

We first consider Case 1: $|B_X| = 0$.

By Lemma 5, $U_X = V_X = \phi$. Since $V_X = \phi$, only disastrous choices in $f_Y$ can be
potential switches for $U_Y$. We consider two subcases: (a) $|D_Y| = 1$; and (b) $|D_Y| = 2$.

(a) If $D_Y = \{f(Y_1)\}$, then $f(Y_1)$ is the only potential switch in $f_Y$. Consider the
following possible combinations of $U_X$, $V_X$ and $U_Y$, $V_Y$, each of which determines a feasible
definition of F, and choose the one with the smallest $\alpha$ to be F. The entire process takes
$O(k)$ time.

(b) If $D_Y = \{f(Y_1), f(Y_2)\}$, then $f(Y_1)$ and $f(Y_2)$ are the two potential switches in $f_Y$.
Without loss of generality, assume $f(X_1) + f(Y_1) \ge f(X_2) + f(Y_2)$. This means that if
$|U_Y| = 1$, it must contain $f(Y_1)$, not $f(Y_2)$. Consider the following feasible definitions of F.


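#   U_X      V_X      U_Y                     V_Y                                                     Time
2   $\phi$   $\phi$   $\{f(Y_1)\}$            $\phi$                                                  $O(1)$
3   $\phi$   $\phi$   $\{f(Y_1)\}$            $\{f(Y_i)\}$, $\forall f(Y_i) \in Y_R$                  $O(k)$
4   $\phi$   $\phi$   $\{f(Y_1), f(Y_2)\}$    $\phi$                                                  $O(1)$
5   $\phi$   $\phi$   $\{f(Y_1), f(Y_2)\}$    $\{f(Y_i)\}$, $\forall f(Y_i) \in Y_R$                  $O(k)$
6   $\phi$   $\phi$   $\{f(Y_1), f(Y_2)\}$    $\{f(Y_i), f(Y_j)\}$, $\forall f(Y_i), f(Y_j) \in Y_R$  $O(k \log k)$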

In the sixth situation, if we check all combinations of $f(Y_i), f(Y_j) \in Y_R$ for $V_Y$, there
will be $O(k^2)$ possibilities. However, not all combinations need to be examined. Our goal is
to choose $f(Y_j) \in Y_R$ for each fixed $f(Y_i) \in Y_R$ so as to minimize $\max\{\alpha_2(U_Y, V_Y), f(X_j) +
n - f(Y_j)\}$, where $\alpha_2(U_Y, V_Y) = \alpha_2 + 2n - f(Y_1) - f(Y_2) - f(Y_i) - f(Y_j)$. First, sort in
time $O(k \log k)$ all $f(Y_j) \in Y_R$ according to the value $f(X_j) + n - f(Y_j)$ nondecreasingly.
Then in the sorted list discard those choices no greater than their left neighbors, yielding
a list of $f(Y_j)$'s sorted by nondecreasing $f(X_j) + n - f(Y_j)$ and nonincreasing $\alpha_2 + 2n -
f(Y_1) - f(Y_2) - f(Y_i) - f(Y_j)$ for any fixed $f(Y_i)$. This step takes $O(k)$ time. Finally, for
each fixed $f(Y_i)$, perform a binary search in the list to locate the $f(Y_j)$ with the minimum
$\max\{\alpha_2 + 2n - f(Y_1) - f(Y_2) - f(Y_i) - f(Y_j), f(X_j) + n - f(Y_j)\}$, taking $O(k \log k)$ for all
$f(Y_i)$'s. We can then check the $|Y_R|$ feasible definitions of F with $V_Y = \{f(Y_i), f(Y_j)\}$ as
defined above.
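The sort, discard, and binary-search procedure just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: each candidate is abstracted to a pair (a, b), where a stands for $f(X_j) + n - f(Y_j)$ and b for $f(Y_j)$, and the query value c stands for the part of $\alpha_2(U_Y, V_Y)$ that does not depend on j. The function names are hypothetical, and the sketch does not exclude the case j = i.

```python
from bisect import bisect_left


def build_frontier(items):
    """items: non-empty list of (a, b) pairs.

    Returns the pairs sorted by increasing a and with strictly increasing b:
    a pair whose b is no greater than that of a left neighbour is discarded,
    since it can never give a smaller value of max(c - b, a)."""
    frontier = []
    for a, b in sorted(items, key=lambda t: (t[0], -t[1])):
        if not frontier or b > frontier[-1][1]:
            frontier.append((a, b))
    return frontier


def best_partner(frontier, keys, c):
    """Minimise max(c - b, a) over the frontier by binary search.

    keys[i] must equal a_i + b_i; it is increasing along the frontier."""
    pos = bisect_left(keys, c)  # first index where a >= c - b
    candidates = []
    if pos < len(frontier):
        candidates.append(frontier[pos][0])          # here the max is a
    if pos > 0:
        candidates.append(c - frontier[pos - 1][1])  # here the max is c - b
    return min(candidates)
```

Along the frontier a increases while c - b decreases, so the minimum of their maximum lies at the crossover point, which is why only the two positions adjacent to the binary-search result need to be compared. Building the frontier costs one sort, and each query costs one binary search.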

Case 2: $|B_X| = 1$.

By Lemma 6, when $D_Y = \{f(Y_1), f(Y_2)\}$, one of $f(X_1)$ and $f(X_2)$ must be a bad choice
in $f_X$, which implies that it is a disastrous choice. Therefore, if $|D_Y| = 2$, then $|D_X| = 1$.
We consider four subcases: (a) $|D_X| = 0$ and $|D_Y| = 1$; (b) $|D_X| = 1$ and $|D_Y| = 0$; (c)
$|D_X| = 1$ and $|D_Y| = 1$; and (d) $|D_X| = 1$ and $|D_Y| = 2$. We notice that in any situation
with $V_X = \phi$, only disastrous choices in $f_Y$ can be potential switches for $U_Y$, and that
whether $|D_Y| = 0$ or 1 or 2, we can use the same method as in Case 1 to determine F in
$O(k \log k)$ time. Let us now assume $|V_X| = 1$, i.e., $V_X = \{f(X_i)\}$, $\forall f(X_i) \in X_R$, which also
implies $U_X = B_X$.

(a) If $D_X = \phi$, and $D_Y = \{f(Y_1)\}$, then $f(X_1)$ can not be a bad choice. Assuming
$B_X = \{f(X_2)\}$, we have $U_X = \{f(X_2)\}$. Because $f(X_2)$ is a potential switch that is not a
disastrous choice, we have $f(Y_2) \in V_Y$ and $f(Y_1) \in U_Y$. Note that $f(Y_i)$ may also be in $U_Y$
if $f(Y_i)$ is a bad choice. Consider the following feasible definitions of F.


#   U_Y                               V_Y                                                             Time
1   $\{f(Y_1)\}$                      $\{f(Y_2)\}$                                                    $O(k)$
2   $\{f(Y_1), f(Y_i)\}$, $i \ne 1$   $\{f(Y_2)\}$                                                    $O(k)$
3   $\{f(Y_1), f(Y_i)\}$, $i \ne 1$   $\{f(Y_2), f(Y_j)\}$, $\forall f(Y_j) \in Y_R$, $j \ne 2$       $O(k \log k)$

In the second and third situations, we only need to check those feasible definitions
of F with $V_X = \{f(X_i)\}$, for which $f(Y_i) \in B_Y$ and $m - f(X_i) + f(Y_i) > \alpha_1(U_X, V_X) =
\alpha_1 + m - f(X_2) - f(X_i)$. In the third situation, we can avoid checking all $O(k^2)$ combinations
of $f(X_i) \in X_R$ with $i \ne 1$ and $f(Y_j) \in Y_R$ with $j \ne 2$ by using the same method developed
in the sixth situation of subcase (b) in Case 1.

(b) If $D_X = \{f(X_1)\}$, and $D_Y = \phi$, then $U_X = \{f(X_1)\}$, and $V_X = \{f(X_i)\}$, $\forall f(X_i) \in
X_R$. Consider the following feasible definitions of F.


#   U_Y             V_Y                                       Time
1   $\phi$          $\phi$                                    $O(k)$
2   $\{f(Y_i)\}$    $\phi$                                    $O(k)$
3   $\{f(Y_i)\}$    $\{f(Y_j)\}$, $\forall f(Y_j) \in Y_R$    $O(k \log k)$

In the second and third situations, we only need to check those feasible definitions of
F with $V_X = \{f(X_i)\}$, for which $f(Y_i) \in B_Y$ and $m - f(X_i) + f(Y_i) > \alpha_1(U_X, V_X) =
\alpha_1 + m - f(X_1) - f(X_i)$. In the third situation, we use a method similar to that in the sixth
situation of subcase (b) in Case 1 to avoid checking all $O(k^2)$ combinations of $f(X_i) \in X_R$
and $f(Y_j) \in Y_R$. The only differences are that $\alpha_2(U_Y, V_Y) = \alpha_2 + n - f(Y_i) - f(Y_j)$ and
that if $f(Y_1) \in Y_R$ we use $m - f(X_1) + n - f(Y_1)$ instead of $f(X_1) + n - f(Y_1)$ for choice
$f(Y_1)$ in the sorting part.

(c) If $D_X = \{f(X_1)\}$, and $D_Y = \{f(Y_h)\}$, where h is a fixed index equal to or not equal
to 1, then $U_X = \{f(X_1)\}$, $V_X = \{f(X_i)\}$, $\forall f(X_i) \in X_R$. Note that $f(Y_i) \in U_Y$ only when
$f(Y_i) \in B_Y$ and $m - f(X_i) + f(Y_i) > \alpha_1(U_X, V_X) = \alpha_1 + m - f(X_1) - f(X_i)$. Consider the
following feasible definitions of F.


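#   U_Y                               V_Y                                                       Time
1   $\phi$                            $\phi$                                                    $O(k)$
2   $\{f(Y_i)\}$                      $\phi$                                                    $O(k)$
3   $\{f(Y_i)\}$                      $\{f(Y_j)\}$, $\forall f(Y_j) \in Y_R$                    $O(k \log k)$
4   $\{f(Y_h)\}$                      $\phi$                                                    $O(k)$
5   $\{f(Y_h)\}$                      $\{f(Y_j)\}$, $\forall f(Y_j) \in Y_R$, $j \ne 1$         $O(k \log k)$
6   $\{f(Y_h), f(Y_i)\}$, $i \ne h$   $\phi$                                                    $O(k)$
7   $\{f(Y_h), f(Y_i)\}$, $i \ne h$   $\{f(Y_j)\}$, $\forall f(Y_j) \in Y_R$                    $O(k \log k)$
8   $\{f(Y_h), f(Y_i)\}$, $i \ne h$   $\{f(Y_j), f(Y_l)\}$, $\forall f(Y_j), f(Y_l) \in Y_R$    $O(k \log k)$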

The reason why $j \ne 1$ in the fifth situation is that if $f(Y_1) \in Y_R$ and we let $V_Y = \{f(Y_1)\}$,
then $m - f(X_1) + n - f(Y_1) > f(X_1) + f(Y_h) \ge \max\{f(X_1) + f(Y_1), f(X_h) + f(Y_h)\}$, which
indicates the resulting assignment is even worse than the original assignment without any
switches. The method used in the sixth situation of subcase (b) in Case 1 can be applied to
the third, fifth and seventh situations in this subcase to achieve the $O(k \log k)$ bound. In the
eighth situation, if we check all combinations of $f(X_i) \in X_R$ and $f(Y_j), f(Y_l) \in Y_R$, there
will be $O(k^3)$ possibilities. We will show that not all combinations need to be examined.
Our goal is to choose $f(Y_j), f(Y_l) \in Y_R$ for each fixed $f(X_i) \in X_R$ so as to minimize
$\max\{\alpha_2(U_Y, V_Y), f(X_j) + n - f(Y_j), f(X_l) + n - f(Y_l)\}$, where $\alpha_2(U_Y, V_Y) = \alpha_2 + 2n -
f(Y_h) - f(Y_i) - f(Y_j) - f(Y_l)$. Without loss of generality, assume $f(X_j) + n - f(Y_j) \ge f(X_l) +
n - f(Y_l)$. First, sort in $O(k \log k)$ time $f(Y_j) \in Y_R$ according to the value $f(X_j) + n - f(Y_j)$
nondecreasingly. Second, in the sorted list, for each $f(Y_j)$, except the first one, let $f(Y_l)$
be the largest choice among those on the left side of $f(Y_j)$. This can easily be done in
$O(k)$ time. Now, we have a list of $|Y_R| - 1$ choice pairs $f(Y_j), f(Y_l)$ ordered according to
the value $f(X_j) + n - f(Y_j)$ nondecreasingly. Third, in $O(k)$ time discard those pairs with
their sum $f(Y_j) + f(Y_l)$ no greater than that of their left neighbors in the list. Finally,
for each $f(X_i) \in X_R$, use binary search to find the pair $f(Y_j), f(Y_l)$ with the minimum
$\max\{\alpha_2 + 2n - f(Y_h) - f(Y_i) - f(Y_j) - f(Y_l), f(X_j) + n - f(Y_j), f(X_l) + n - f(Y_l)\}$ among
the remaining pairs in the list, which altogether takes $O(k \log k)$ time. In the above process,
if $f(Y_1) \in Y_R$, use $m - f(X_1) + n - f(Y_1)$ instead of $f(X_1) + n - f(Y_1)$ for choice $f(Y_1)$ in
the sorting part.
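The pairing step of the eighth situation (attach to each choice the largest choice to its left in the sorted order) can be sketched as a single prefix-maximum pass. As before, this is a hedged illustration rather than the authors' code: each candidate is abstracted to a pair (a, b) with a standing for $f(X_j) + n - f(Y_j)$ and b for $f(Y_j)$, and the function name is hypothetical.

```python
def leading_pairs(items):
    """items: list of (a, b) pairs.

    Sort by a, then pair every element except the first with the largest b
    found to its left. Returns a list of (a_j, b_j + best_left_b): the first
    entry is the larger a of the pair, the second is the sum of the two b's."""
    if len(items) < 2:
        return []
    ordered = sorted(items)
    pairs = []
    best_left = ordered[0][1]
    for a, b in ordered[1:]:
        pairs.append((a, b + best_left))
        best_left = max(best_left, b)  # running maximum of b to the left
    return pairs
```

For a query value c, the minimum of max(c - (b_j + b_l), a_j, a_l) over all pairs of distinct candidates equals the minimum of max(c - s, a) over the returned list, so the list has the same shape as the single-choice list of the sixth situation and can be filtered and binary-searched in the same way.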

(d) If $D_X = \{f(X_1)\}$, and $D_Y = \{f(Y_1), f(Y_2)\}$, then by Lemma 6 $f(X_1) + f(X_2) > \alpha^*$,
and $f(X_1)$ and $f(X_2)$ are in the different columns of $f_X$. Since $f(X_1)$ is the bad choice,
then $f(X_2) \in X_R$. By assumption $U_X = \{f(X_1)\}$, and $V_X = \{f(X_i)\}$, $\forall f(X_i) \in X_R$. We
notice the following properties of the feasible definitions of F.

First, for the situation in which $V_X = \{f(X_2)\}$, the number of feasible definitions
we need to check is bounded by $O(k \log k)$. In the following discussion, we assume
$V_X = \{f(X_i)\}$, where $i \ne 2$.

Second, we do not need to consider those situations where $V_X = \{f(X_i)\}$, for which
$f(Y_i) \in B_Y$. Assume $V_X = \{f(X_i)\}$, for which $i \ne 2$ and $f(Y_i) \in B_Y$. We have $f(X_2) +
f(Y_2) > \alpha^* \ge \alpha_2 \ge f(Y_1) + f(Y_2) + f(Y_i)$, therefore $f(X_2) > f(Y_1) + f(Y_i)$. We also have
$f(X_2) + f(Y_2) > \alpha^* \ge \alpha_1 \ge f(X_2) + f(X_i)$, therefore $f(Y_2) > f(X_i)$. We can show that
$m - f(X_i) + n - f(Y_i) > f(X_1) + f(Y_1)$, because $f(X_1) \le m - f(X_2) < m - f(Y_1) - f(Y_i) \le
m - f(Y_1) - f(Y_i) + n - f(X_i)$. We can then show that $m - f(X_i) + n - f(Y_i) > f(X_2) + f(Y_2)$,
because $f(X_2) + f(X_i) \le \alpha^* < f(X_1) + f(Y_1) < f(X_1) + f(X_2) - f(Y_i) \le m - f(Y_i) \le m -
\min\{f(Y_2), f(Y_i)\} \le m - \min\{f(Y_2), f(Y_i)\} + n - \max\{f(Y_2), f(Y_i)\} = m + n - f(Y_2) - f(Y_i)$.
This means that $m - f(X_i) + f(Y_i) > m - f(X_i) + n - f(Y_i) > \max\{f(X_1) + f(Y_1), f(X_2) +
f(Y_2)\}$, which indicates that whether we switch $f(Y_i)$ or not the resulting assignment is
always worse than the original assignment without any switches.

Taking the above facts into account, we only need to consider the following feasible
definitions of F. Without loss of generality, assume h = 1 or 2, where $f(X_h) + f(Y_h) =
\max\{f(X_1) + f(Y_1), f(X_2) + f(Y_2)\}$. This means that if $|U_Y| = 1$ then $U_Y = \{f(Y_h)\}$.


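#   U_Y                     V_Y                                                       Time
1   $\phi$                  $\phi$                                                    $O(k)$
2   $\{f(Y_h)\}$            $\phi$                                                    $O(k)$
3   $\{f(Y_h)\}$            $\{f(Y_j)\}$, $\forall f(Y_j) \in Y_R$, $j \ne i$         $O(k \log k)$
4   $\{f(Y_1), f(Y_2)\}$    $\phi$                                                    $O(k)$
5   $\{f(Y_1), f(Y_2)\}$    $\{f(Y_j)\}$, $\forall f(Y_j) \in Y_R$, $j \ne i$         $O(k \log k)$
6   $\{f(Y_1), f(Y_2)\}$    $\{f(Y_j), f(Y_l)\}$, $\forall f(Y_j), f(Y_l) \in Y_R$    $O(k \log k)$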

Similar to the previous subcases, the number of situations we need to check in this
subcase is also bounded by $O(k \log k)$.

Case 3: $|B_X| = 2$.

By Lemma 7, $|D_X| \le 1$ and $|D_Y| \le 1$, and if there is a disastrous choice in $f_X$, there is
also a disastrous choice in $f_Y$. We consider two subcases: (a) $|D_X| = 0$ and $|D_Y| = 1$; and
(b) $|D_X| = 1$ and $|D_Y| = 1$.
(a) If $D_X = \phi$, and $D_Y = \{f(Y_1)\}$, then by Lemma 7 $f(X_1) \in X_R$, and $V_Y$, if nonempty,
only contains $f(Y_i)$ with $f(X_i) \in B_X$. Otherwise, $f(X_i) + n - f(Y_i) > m - f(X_1) + n - f(Y_1) >
f(X_1) + f(Y_1)$, which implies that the resulting $\alpha$ is even larger than that for f. So there
are no potential switches in $f_X$. Consider the following feasible definitions of F.


#   U_X      V_X      U_Y             V_Y                                                                Time
1   $\phi$   $\phi$   $\phi$          $\phi$                                                             $O(1)$
2   $\phi$   $\phi$   $\{f(Y_1)\}$    $\phi$                                                             $O(1)$
3   $\phi$   $\phi$   $\{f(Y_1)\}$    $\{f(Y_i)\}$, $\forall f(Y_i) \in Y_R$ with $f(X_i) \in B_X$       $O(k)$

(b) If $D_X = \{f(X_1)\}$, and $D_Y = \{f(Y_1)\}$, then we notice the following properties of the
feasible definitions of F.

First, we do not have to consider the situation in which both $f(X_1)$ and $f(Y_1)$ are
switched. Assuming $f(X_2)$ is the other bad choice in $f_X$, we have $m - f(X_1) + n - f(Y_1) <
f(X_1) + n - f(Y_1) < f(X_1) + f(X_2) \le \alpha^*$, which shows that switching just $f(Y_1)$ is
already good enough, so there is no need to switch both $f(X_1)$ and $f(Y_1)$.

Second, $|U_X| \le 1$. Assume $U_X = \{f(X_1), f(X_2)\}$. This case happens only when
$f(Y_2) \in V_Y$, $f(X_3) \in V_X$ for some $f(Y_3) \in U_Y$, and $f(X_2) + n - f(Y_2) > \alpha_2(U_Y, V_Y) =
\alpha_2 + n - f(Y_3) - f(Y_2)$. Then $f(Y_1) + f(Y_3) \le \alpha_2 < f(X_2) + f(Y_3)$. So $f(Y_1) < f(X_2)$. On
the other hand, $f(X_1) + f(Y_1) > \alpha^* \ge f(X_1) + f(X_2)$. So $f(Y_1) > f(X_2)$. A contradiction!

Taking the above facts into account, we only need to consider the following feasible
definitions of F.


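#   U_X             V_X                                       U_Y             V_Y                                       Time
1   $\phi$          $\phi$                                    $\phi$          $\phi$                                    $O(1)$
2   $\phi$          $\phi$                                    $\{f(Y_1)\}$    $\phi$                                    $O(1)$
3   $\phi$          $\phi$                                    $\{f(Y_1)\}$    $\{f(Y_i)\}$, $\forall f(Y_i) \in Y_R$    $O(k)$
4   $\{f(X_2)\}$    $\phi$                                    $\{f(Y_1)\}$    $\{f(Y_2)\}$                              $O(1)$
5   $\{f(X_2)\}$    $\{f(X_i)\}$, $\forall f(X_i) \in X_R$    $\{f(Y_1)\}$    $\{f(Y_2)\}$                              $O(k)$
6   $\{f(X_1)\}$    $\phi$                                    $\phi$          $\phi$                                    $O(1)$
7   $\{f(X_1)\}$    $\{f(X_i)\}$, $\forall f(X_i) \in X_R$    $\phi$          $\phi$                                    $O(k)$
8   $\{f(X_1)\}$    $\{f(X_i)\}$, $\forall f(X_i) \in X_R$    $\{f(Y_i)\}$    $\phi$                                    $O(k)$
9   $\{f(X_1)\}$    $\{f(X_i)\}$, $\forall f(X_i) \in X_R$    $\{f(Y_i)\}$    $\{f(Y_j)\}$, $\forall f(Y_j) \in Y_R$    $O(k)$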

We check the fourth and fifth situations only when $f(Y_2) \in Y_R$, and $f(X_2) + n - f(Y_2) >
\alpha_2(U_Y, V_Y) = \alpha_2 + n - f(Y_1) - f(Y_2)$. In the eighth and ninth situations, we only need to
check those combinations with $V_X = \{f(X_i)\}$, for which $f(Y_i) \in B_Y$ and $m - f(X_i) + f(Y_i) >
\alpha_1(U_X, V_X) = \alpha_1 + m - f(X_1) - f(X_i)$. We shall prove that there is at most one such $f(X_i)$.
Assume there are two, say, $f(X_i)$ and $f(X_j)$. Then $m - f(X_i) + f(Y_i) > \alpha_1 + m - f(X_1) -
f(X_i)$. So $f(X_1) + f(X_2) \le \alpha_1 < f(X_1) + f(Y_i)$. Therefore $f(X_2) < f(Y_i)$. Similarly, we
have $f(X_2) < f(Y_j)$. However, $f(X_1) + f(Y_1) > \alpha^* \ge \alpha_2 \ge f(Y_1) + f(Y_i) + f(Y_j)$. We have
$m \ge f(X_1) > f(Y_i) + f(Y_j) > 2 f(X_2)$. So $f(X_2) < m/2$, which contradicts the assumption that
$f(X_2)$ is a bad choice. In the eighth situation, we spend $O(k)$ to find the $f(X_i) \in X_R$, if
it exists, and $O(1)$ to check the corresponding situation. In the ninth situation, we spend
$O(k)$ to find the $f(X_i) \in X_R$, if it exists, and spend $O(k)$ to check the feasible definitions
with $V_Y = \{f(Y_j)\}$, $\forall f(Y_j) \in Y_R$.

5  Conclusion 

This paper studies a problem of routing messages on an SIMD parallel architecture whose
processing elements (PEs) are connected as a toroidal mesh. In our problem the sets of
messages processors send are isomorphic, meaning that if some processor i has a message to
send which must traverse $x_i$ PEs in the East-West dimension and $y_i$ PEs in the North-South,
then all PEs have a message to send with identical routing offsets. We examine variants of
the problem having differing assumptions concerning simultaneous use of communication
channels, and the ability to buffer a message temporarily en route. Our solution approach
is to view the problem as a scheduling problem, related to a previously studied open shop
scheduling problem. Our analysis provides new results not only on the motivating routing
problem, but on a new class of scheduling problems as well.

A spectrum of complexities is obtained, from linear in the number of messages (k) per
processor to NP-complete. The variation where all ports may be used simultaneously and
messages may be buffered en route is of particular interest; we first show quickly why the
problem has a polynomial solution, and then do an extensive case analysis to show that
the complexity is $O(k \log k)$. The case analysis lacks elegance; our hope is that future work
may provide a more direct solution to the problem.


Appendix 

Lemma 1 If $|B_X| \ge 3$, then $F = f$.

Proof. We shall prove that $D_X = \phi$ and $D_Y = \phi$. Suppose that $f(X_1), f(X_2), f(X_3), \ldots$
are the bad choices in $f_X$, and that $f(X_1)$ is the largest among all. Assume that there is at
least one disastrous choice in $f_X$ (alt., $f_Y$), say, $f(X_i)$ (alt., $f(Y_i)$). Then $f(X_i) + f(Y_i) >
\alpha^* \ge \alpha_1 \ge f(X_1) + f(X_2) + f(X_3)$. So $f(Y_i) > (f(X_1) - f(X_i)) + f(X_2) + f(X_3) \ge
f(X_2) + f(X_3) > 2 \times \frac{m}{2} = m$, which is impossible. ■


Lemma 2 $|D_Y| \le 2$.

Proof. Assume $D_Y = \{f(Y_1), f(Y_2), f(Y_3), \ldots\}$. Then at least two of $f(X_1)$, $f(X_2)$,
$f(X_3)$ are in the same column in the assignment diagram for $f_X$, say, $f(X_1)$ and $f(X_2)$.
Since both $f(Y_1)$ and $f(Y_2)$ are disastrous choices, we have $f(X_1) + f(Y_1) > \alpha^*$, and
$f(X_2) + f(Y_2) > \alpha^*$, therefore,

$f(X_1) + f(Y_1) + f(X_2) + f(Y_2) > 2\alpha^*$.

However, we know $f(X_1) + f(X_2) \le \alpha_1 \le \alpha^*$, and $f(Y_1) + f(Y_2) \le \alpha_2 \le \alpha^*$, therefore,

$f(X_1) + f(X_2) + f(Y_1) + f(Y_2) \le 2\alpha^*$.

This is a contradiction. ■


Lemma 3 $|U_X| \ge |V_X|$ and $|U_Y| \ge |V_Y|$.

Proof. We only prove $|U_X| \ge |V_X|$, since the proof of $|U_Y| \ge |V_Y|$ is totally symmetric
and hence can be omitted. For notational simplicity, we ignore the subscript X in the
discussion below.

Assume $|U| < |V|$. Define any $V' \subset V$ with $|V'| = |U|$. Let $\alpha_1(U, V)$ be the corresponding
$\alpha_1$ resulting from switches in U and V, and $\alpha_1(U, V')$ be the corresponding $\alpha_1$
resulting from switches in U and V'. Let $\alpha_1 = \max\{\alpha_1^+, \alpha_1^-\}$, where $\alpha_1^+ = \sum_{f(X_i) = x_i} x_i$,
and $\alpha_1^- = \sum_{f(X_i) = m - x_i} (m - x_i)$.

$\alpha_1(U, V) = \max\{\alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i)),\ \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i))\}$

$\qquad = \alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i))$

(Since $\alpha_1^+ - \alpha_1^- \ge m(|U| - |V|)$.)

$\alpha_1(U, V') = \max\{\alpha_1^+ - \sum_U f(X_i) + \sum_{V'} (m - f(X_i)),\ \alpha_1^- - \sum_{V'} f(X_i) + \sum_U (m - f(X_i))\}$

$\qquad = \alpha_1 - \sum_U f(X_i) + \sum_{V'} (m - f(X_i))$

$\qquad \le \alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i))$

(Since $\alpha_1^- - \alpha_1^+ \le \sum_{V - V'} (m - f(X_i))$ if $\alpha_1^- > \alpha_1^+$.)

$\qquad = \alpha_1(U, V)$.

Therefore, $\alpha_1(U, V') \le \alpha_1(U, V)$, and it has fewer bad choices, so $\alpha_1(U, V')$ is preferable.
In other words, the choices in $V - V'$ do not have to be switched to the opposite column
since this does not lower $\alpha_1$, and instead creates some new bad choices. ■
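The expression for $\alpha_1(U, V)$ used in this proof is easy to evaluate directly. The sketch below is an illustration under stated assumptions, not code from the paper: the function name is hypothetical, U and V are given as lists of the assigned values $f(X_i)$ of the switched choices, and `alpha_plus` and `alpha_minus` are the two column loads before any switch.

```python
def alpha_after_switch(alpha_plus, alpha_minus, m, U, V):
    """Evaluate alpha_1(U, V) from the two column loads.

    Each value u in U leaves the '+' load and contributes m - u to the
    '-' load; each value v in V leaves the '-' load and contributes
    m - v to the '+' load."""
    new_plus = alpha_plus - sum(U) + sum(m - v for v in V)
    new_minus = alpha_minus - sum(V) + sum(m - u for u in U)
    return max(new_plus, new_minus)
```

For instance, with loads 10 and 4 and m = 8, switching U = [5] alone gives 7, while additionally switching V = [3] gives 10 and V = [3, 2] gives 16, which illustrates on one example how switching extra choices of V can raise the result instead of lowering it.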


Lemma 4 All members of $U_X$ and $U_Y$ are potential switches.

Proof. As declared earlier, we only prove the lemma for $U_X$, and omit the subscript X.
If $|U| = |V|$, and U contains some choices that are not potential switches, let U' be the set
of potential switches in U, and V' be any subset of V with $|V'| = |U'|$.

$\alpha_1(U, V) = \max\{\alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i)),\ \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i))\}$

$\qquad = \alpha_1 - \sum_U f(X_i) + \sum_V (m - f(X_i))$

$\alpha_1(U', V') = \max\{\alpha_1^+ - \sum_{U'} f(X_i) + \sum_{V'} (m - f(X_i)),\ \alpha_1^- - \sum_{V'} f(X_i) + \sum_{U'} (m - f(X_i))\}$

$\qquad = \alpha_1 - \sum_{U'} f(X_i) + \sum_{V'} (m - f(X_i))$

$\qquad \le \alpha_1 - \sum_U f(X_i) + \sum_V (m - f(X_i))$

(Since $\sum_{U - U'} f(X_i) \le \sum_{V - V'} (m - f(X_i))$.)

$\qquad = \alpha_1(U, V)$.

Therefore, $\alpha_1(U', V') \le \alpha_1(U, V)$, and U' does not contain any unnecessary switches.

If $|U| > |V|$, and U contains some choices that are not potential switches, let U' be the
set of potential switches in U, and V' be any subset of V with $|V'| = \max\{0, |V| - |U - U'|\}$.

$\alpha_1(U, V) = \max\{\alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i)),\ \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i))\}$

$\qquad = \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i))$

(Since $\alpha_1^+ - \alpha_1^- \le m(|U| - |V|)$.)

$\alpha_1(U', V') = \max\{\alpha_1^+ - \sum_{U'} f(X_i) + \sum_{V'} (m - f(X_i)),\ \alpha_1^- - \sum_{V'} f(X_i) + \sum_{U'} (m - f(X_i))\}$

$\qquad = \alpha_1^- - \sum_{V'} f(X_i) + \sum_{U'} (m - f(X_i))$

(Since $\alpha_1^+ - \alpha_1^- \le m(|U'| - |V'|)$.)

$\qquad \le \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i))$

(Since $\sum_{V - V'} f(X_i) \le \sum_{U - U'} (m - f(X_i))$.)

$\qquad = \alpha_1(U, V)$.

Therefore, $\alpha_1(U', V') \le \alpha_1(U, V)$, and U' does not contain any unnecessary switches. ■

Lemma 5 If $|B_X| = 0$, then $|U_X| = |V_X| = 0$, and $|D_Y| \le 2$.

Proof. By Lemma 3 and Lemma 4, $|V_X| \le |U_X| = 0$. By Lemma 2, $|D_Y| \le 2$. ■

Lemma 6 If $|B_X| = 1$, then $|V_X| \le |U_X| \le 1$ and $|D_Y| \le 2$. Furthermore, if $D_Y =
\{f(Y_1), f(Y_2)\}$, then $\alpha^* < f(X_1) + f(X_2)$ and one of $f(X_1)$ and $f(X_2)$ is the largest bad
choice in $f_X$.

Proof. Assume $\alpha^* \ge f(X_1) + f(X_2)$, then $f(X_1) + f(Y_1) > \alpha^* \ge f(X_1) + f(X_2)$. So
$f(Y_1) > f(X_2)$. On the other hand, $f(X_2) + f(Y_2) > \alpha^* \ge \alpha_2 \ge f(Y_1) + f(Y_2)$. So
$f(X_2) > f(Y_1)$. A contradiction!

We notice that when $\alpha^* < f(X_1) + f(X_2)$, $f(X_1)$ and $f(X_2)$ are in the different columns,
and one of them, say, $f(X_1)$, has to be the largest bad choice in $f_X$. ■

Lemma 7 If $|B_X| = 2$, then $|D_X| \le 1$ and $|D_Y| \le 1$. Furthermore, if $D_X = \{f(X_1)\}$, then
$D_Y = \{f(Y_1)\}$; if $D_Y = \{f(Y_1)\}$ and $f(X_1) \notin B_X$, then $f(X_1)$ is in the right column of the
assignment diagram of $f_X$.

Proof. Suppose that $f(X_i)$ and $f(X_j)$ are two bad choices in $f_X$. First, we notice that if
$f(X_l)$ (l = i or j) is a disastrous choice, $f(Y_l)$ is also a disastrous choice, because $f(X_l) +
f(Y_l) > \alpha^* \ge \alpha_1 \ge f(X_i) + f(X_j)$, and therefore $f(Y_l) > \frac{m}{2} \ge \frac{n}{2}$.

Assume that there are at least two disastrous choices in $f_X$. They must be $f(X_i)$ and
$f(X_j)$. Since both $f(Y_i)$ and $f(Y_j)$ are disastrous choices, they are in the same column in the
assignment diagram of $f_Y$. Then $f(X_i) + f(Y_i) > \alpha^* \ge \alpha_2 \ge f(Y_i) + f(Y_j)$. So $f(X_i) >
f(Y_j)$. On the other hand, $f(X_j) + f(Y_j) > \alpha^* \ge \alpha_1 \ge f(X_i) + f(X_j)$. So $f(Y_j) > f(X_i)$.
A contradiction!

Assume that there are at least two disastrous choices in $f_Y$, say, $f(Y_1)$ and $f(Y_2)$. Then
$f(X_1) + f(Y_1) > \alpha^* \ge \alpha_2 \ge f(Y_1) + f(Y_2)$. So $f(X_1) > f(Y_2)$. On the other hand,
$f(X_2) + f(Y_2) > \alpha^* \ge \alpha_1 \ge f(X_i) + f(X_j) \ge f(X_1) + f(X_2)$. So $f(Y_2) > f(X_1)$. A
contradiction!

If $D_X = \{f(X_1)\}$, then $D_Y = \{f(Y_1)\}$. If $D_Y = \{f(Y_1)\}$ and $f(X_1) \notin B_X$, then $f(X_1)$
must be in the right column. Otherwise, $f(X_1) + f(Y_1) > f(X_i) + f(X_j) + f(X_1)$. So
$f(Y_1) > f(X_i) + f(X_j) > m$, which is impossible. ■